Homework 8 - Naiara Alonso Montes¶

Exercise 1¶

All metric but diameter have been calculated going through the graphs, sorting by number of nodes and getting the metrics for a single component, in this case the one with more nodes.

The diameter has been computing using the function diameter from library networkx.

Metric words.5all.txt words.5common.txt words5.txt
Number of nodes 15,920 1,382 1,044
Number of edges 51,290 958 556
Number of connected components 1,116 677 620
Largest Connected Component (nodes) 14,556 355 56
Largest Connected Component (edges) 51,021 518 71
Longest path using DFS (nodes) 8,905 65 12
Width of DFS tree 34 8 8
Longest path using BFS (nodes) 19 26 12
Width of BFS tree 14 8 8
Diameter of the largest component 24 36 14

Some conclusions that can be obtained from the results:

  • The files words.5common.txt and words5.txt have a "similar" nodes, which means "similar" number of words. On contrast, the number of edges is almost the half, and also almost equal number of connected components. This means that words in words.5common.txt are closer in relation between them in opose of words5.txt.
  • In general, a larger diameter indicates a more spread of the data and also less densely connected componemts. This can be seen in words.common.txt where the largest component has the higest value among the other word files. It means thay, despite the fact that it has been previously said that even though the whole graph is closer in relation, the largest component is not.

Exercise 2¶

My implementation takes 2 random words from the largest connected component and returns, step by step the path. Assuming that they are in the same component will lead always to a path, so there ir no need to check if they belong to the same connected component. This is the example of one execution:

Shortest path from 'think' to 'crack' using NetworkX with weights:
think -> thick (weight: 1)
thick -> trick (weight: 1)
trick -> track (weight: 1)
track -> crack (weight: 1)
Path length: 4, path weight: 4

The weight between consecutive words is 1, so the total weight is the sum of steps.

To the question Can you find any example, where the more costly 2 or 3-letter changes would be cheaper than a path of changing only one letter at any time only?, the answer form my point of view is that, based on the construction of the graph it is impossible to reach directly the 2/3 letter changes. Anyways, if we assume that the change is 3 and there exists a direct edge connecting both nodes, the weight will be $2 ⋅3 - 1 = 5$, which is greater than the distance $3⋅(2 ⋅ 1 - 1) = 3$, so at the end, if we only consider the weight as the cost from one node to other, it is more efficient to perform more number of steps.

Puzzle game¶

From shore -> skirt (hand-made)

  • shore -> short
  • short -> shirt
  • shirt -> skirt

From shore -> skirt (code)

Shortest path from 'shore' to 'skirt' using NetworkX with weights:
shore -> short (weight: 1)
short -> shirt (weight: 1)
shirt -> skirt (weight: 1)
Path length: 3, path weight: 3

Same result obtained (even it might seem that I took a look at the results, the truth is that I did all by myself :))

Exercise 3¶

I modified, the weight function so, in the first image the specific color is set to black and in the second, the specific color is set to white. Here are the results:

mapYellow_graph_dijkstra.png

mapYellow_graph_dijkstra.png

There are some differences, not so many.

Then, I put the original image in negative (opose colors), with a neutra weight color, for example, green, to see if the path presents many diferences. Here are the results:

mapYellow_graph_astar.png

mapWhite_graph_astar.png

The conclusion for this obtained paths is that, for the same weight functions, the color of the image affects, as the paths are weight based following the function A*. Both have the sume number of steps (4) but different choices based on the weights.

Exercise 4¶

The following images represent Dikjstra and A* search algorithms. The brigthness of the dots represent the order on which they have been visited. The source is represented in red and the target in yellow.

mapYellow_dijkstra_front.png

This is Dijkstra

mapYellow_astar_front.png

And this is A*

The reason why A* is less-spreaded than Dijkstra is because it uses weights to focus the search.

Exercise 5¶

This is the obtained MST using Kruskal algorithm, the source is highlighted in red (same source as the other examples).

mapYellow_mst.png

Exercise 6 (bonus)¶

I decided to choose article "Machine Learning Based Heuristic Search Algorithms to Solve Birds of a Feather Card Game" by Bryon Kucharski, Azad Deihim, Mehmet Ergezer. The reason why I choose this one is not based on the length of the paper, it is because after taking look at all sugested articles, I think that it is easier to understand and it is also related to machine learning concepts that I am currently learning.

The most optimisting schedule for reading and sumarizing the article would be finishing with the reading by Thursday/Friday as they are the days were I have more time to focus on homeworks and finishing the summary by Saturday.

In [ ]:
!jupyter nbconvert --to html HW8.ipynb